##                ListingKey ListingNumber           ListingCreationDate
## 1 1021339766868145413AB3B        193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1       1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A         81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A        658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2        909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D       1074836 2013-12-14 08:26:37.093000000
##   CreditGrade Term LoanStatus          ClosedDate BorrowerAPR BorrowerRate
## 1           C   36  Completed 2009-08-14 00:00:00     0.16516       0.1580
## 2               36    Current                         0.12016       0.0920
## 3          HR   36  Completed 2009-12-17 00:00:00     0.28269       0.2750
## 4               36    Current                         0.12528       0.0974
## 5               36    Current                         0.24614       0.2085
## 6               60    Current                         0.15425       0.1314
##   LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1      0.1380                      NA            NA              NA
## 2      0.0820                 0.07960        0.0249         0.05470
## 3      0.2400                      NA            NA              NA
## 4      0.0874                 0.08490        0.0249         0.06000
## 5      0.1985                 0.18316        0.0925         0.09066
## 6      0.1214                 0.11567        0.0449         0.07077
##   ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1                      NA                                 NA
## 2                       6                     A            7
## 3                      NA                                 NA
## 4                       6                     A            9
## 5                       3                     D            4
## 6                       5                     B           10
##   ListingCategory..numeric. BorrowerState    Occupation EmploymentStatus
## 1                         0            CO         Other    Self-employed
## 2                         2            CO  Professional         Employed
## 3                         0            GA         Other    Not available
## 4                        16            GA Skilled Labor         Employed
## 5                         2            MN     Executive         Employed
## 6                         1            NM  Professional         Employed
##   EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1                        2                True             True
## 2                       44               False            False
## 3                       NA               False             True
## 4                      113                True            False
## 5                       44                True            False
## 6                       82                True            False
##                  GroupKey              DateCreditPulled
## 1                         2007-08-26 18:41:46.780000000
## 2                                   2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4                                   2012-10-22 11:02:32
## 5                                   2013-09-14 18:38:44
## 6                                   2013-12-14 08:26:40
##   CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1                   640                   659     2001-10-11 00:00:00
## 2                   680                   699     1996-03-18 00:00:00
## 3                   480                   499     2002-07-27 00:00:00
## 4                   800                   819     1983-02-28 00:00:00
## 5                   680                   699     2004-02-20 00:00:00
## 6                   740                   759     1973-03-01 00:00:00
##   CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1                  5               4                         12
## 2                 14              14                         29
## 3                 NA              NA                          3
## 4                  5               5                         29
## 5                 19              19                         49
## 6                 21              17                         49
##   OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1                     1                          24                    3
## 2                    13                         389                    3
## 3                     0                           0                    0
## 4                     7                         115                    0
## 5                     6                         220                    1
## 6                    13                        1410                    0
##   TotalInquiries CurrentDelinquencies AmountDelinquent
## 1              3                    2              472
## 2              5                    0                0
## 3              1                    1               NA
## 4              1                    4            10056
## 5              9                    0                0
## 6              2                    0                0
##   DelinquenciesLast7Years PublicRecordsLast10Years
## 1                       4                        0
## 2                       0                        1
## 3                       0                        0
## 4                      14                        0
## 5                       0                        0
## 6                       0                        0
##   PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1                         0                      0                0.00
## 2                         0                   3989                0.21
## 3                        NA                     NA                  NA
## 4                         0                   1444                0.04
## 5                         0                   6193                0.81
## 6                         0                  62999                0.39
##   AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1                    1500          11                               0.81
## 2                   10266          29                               1.00
## 3                      NA          NA                                 NA
## 4                   30754          26                               0.76
## 5                     695          39                               0.95
## 6                   86509          47                               1.00
##   TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 1                       0              0.17 $25,000-49,999
## 2                       2              0.18 $50,000-74,999
## 3                      NA              0.06  Not displayed
## 4                       0              0.15 $25,000-49,999
## 5                       2              0.26      $100,000+
## 6                       0              0.36      $100,000+
##   IncomeVerifiable StatedMonthlyIncome                 LoanKey
## 1             True            3083.333 E33A3400205839220442E84
## 2             True            6125.000 9E3B37071505919926B1D82
## 3             True            2083.333 6954337960046817851BCB2
## 4             True            2875.000 A0393664465886295619C51
## 5             True            9583.333 A180369302188889200689E
## 6             True            8333.333 C3D63702273952547E79520
##   TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1                NA                         NA                    NA
## 2                NA                         NA                    NA
## 3                NA                         NA                    NA
## 4                NA                         NA                    NA
## 5                 1                         11                    11
## 6                NA                         NA                    NA
##   ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1                                  NA                              NA
## 2                                  NA                              NA
## 3                                  NA                              NA
## 4                                  NA                              NA
## 5                                   0                               0
## 6                                  NA                              NA
##   ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1                       NA                          NA
## 2                       NA                          NA
## 3                       NA                          NA
## 4                       NA                          NA
## 5                    11000                      9947.9
## 6                       NA                          NA
##   ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1                          NA                         0
## 2                          NA                         0
## 3                          NA                         0
## 4                          NA                         0
## 5                          NA                         0
## 6                          NA                         0
##   LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1                            NA                         78      19141
## 2                            NA                          0     134815
## 3                            NA                         86       6466
## 4                            NA                         16      77296
## 5                            NA                          6     102670
## 6                            NA                          3     123257
##   LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1               9425 2007-09-12 00:00:00                Q3 2007
## 2              10000 2014-03-03 00:00:00                Q1 2014
## 3               3001 2007-01-17 00:00:00                Q1 2007
## 4              10000 2012-11-01 00:00:00                Q4 2012
## 5              15000 2013-09-20 00:00:00                Q3 2013
## 6              15000 2013-12-24 00:00:00                Q4 2013
##                 MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA             330.43            11396.14
## 2 1D13370546739025387B2F4             318.93                0.00
## 3 5F7033715035555618FA612             123.32             4186.63
## 4 9ADE356069835475068C6D2             321.45             5143.20
## 5 36CE356043264555721F06C             563.97             2819.85
## 6 874A3701157341738DE458F             342.37              679.34
##   LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1                      9425.00            1971.14        -133.18
## 2                         0.00               0.00           0.00
## 3                      3001.00            1185.63         -24.20
## 4                      4091.09            1052.11        -108.01
## 5                      1563.22            1256.63         -60.27
## 6                       351.89             327.45         -25.33
##   LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1                 0                     0                   0
## 2                 0                     0                   0
## 3                 0                     0                   0
## 4                 0                     0                   0
## 5                 0                     0                   0
## 6                 0                     0                   0
##   LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1                               0             1               0
## 2                               0             1               0
## 3                               0             1               0
## 4                               0             1               0
## 5                               0             1               0
## 6                               0             1               0
##   InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1                          0                           0       258
## 2                          0                           0         1
## 3                          0                           0        41
## 4                          0                           0       158
## 5                          0                           0        20
## 6                          0                           0         1

Loan data

In this project we analyse a dataset from Prosper, a company in the peer-to-peer lending industry.

Univariate Plots Section

In this section we perform a preliminary exploration of the dataset to get an understanding of its structure and of the individual variables in the loan dataset.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

The loan original amount is the amount that was bid. The median loan is 6,500. We suspect the money is needed for extra expenses caused by unexpected problems, such as home improvements, or for small purchases such as a holiday. Loans above 25,000 seem to be rare, so we will perform an outlier check.

## Outliers identified: 4395
## Propotion (%) of outliers: 4
## Mean of the outliers: 26253.53
## Mean without removing outliers: 8337.01
## Mean if we remove outliers: 7618.17
## Do you want to remove outliers and to replace with NA? [yes/no]:
## Nothing changed

The check identifies 4,395 outliers among the 113,937 observations, about 4% of the data. We first replace the outliers with NA values and then create a new, filtered data frame without them.
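The outlier step described above can be sketched in base R. This is a minimal illustration, assuming the check uses the boxplot rule (values more than 1.5 × IQR beyond the hinges), which the report does not state explicitly. The `amounts` vector is made-up data standing in for `LoanOriginalAmount`.

```r
# Made-up loan amounts standing in for LoanOriginalAmount (illustrative only).
amounts <- c(1000, 2500, 4000, 4000, 6500, 6500,
             8000, 12000, 12000, 15000, 30000, 35000)

# Assumption: outliers are defined by the boxplot rule (1.5 * IQR beyond the hinges).
outliers <- boxplot.stats(amounts)$out

# Replace the flagged values with NA, then drop them to get a filtered copy.
amounts_na <- ifelse(amounts %in% outliers, NA, amounts)
amounts_filtered <- amounts_na[!is.na(amounts_na)]

cat("Outliers identified:", length(outliers), "\n")
cat("Mean without removing outliers:", mean(amounts), "\n")
cat("Mean if we remove outliers:", mean(amounts_filtered), "\n")
```

Keeping the NA-marked vector next to the filtered one makes it possible to compare results with and without the outliers later on.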

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   36.00   36.00   40.83   36.00   60.00

Borrowers can choose between a term of 12, 36 or 60 months. In the plot above we can see that most loans have a term of 36 months.

To improve readability, we map the numeric values to descriptive labels according to this site: https://www.prosper.com/Downloads/Services/Documentation/ProsperDataExport_Details.html
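One way to do such a mapping is a named lookup vector. The sketch below covers only an excerpt of the codes; the code-to-label pairs are given as we understand the data dictionary and should be checked against the linked page. The `listing_category` values are the six codes from the listing at the top of this report.

```r
# Excerpt of the code-to-label lookup (assumption: verify each pair against the
# data dictionary; the remaining codes would be added in the same way).
category_labels <- c(
  "0" = "Not Available",
  "1" = "Debt Consolidation",
  "2" = "Home Improvement",
  "7" = "Other"
)

# ListingCategory..numeric. values of the first six rows shown in this report.
listing_category <- c(0, 2, 0, 16, 2, 1)

# Look up each code by name; codes missing from the excerpt (here 16) become NA.
category <- unname(category_labels[as.character(listing_category)])
category <- factor(category, levels = unname(category_labels))

table(category, useNA = "ifany")
```

A lookup vector keeps the mapping in one place, so completing or correcting it does not touch the rest of the analysis.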

As we can see in the plot above, most loans fall into the categories “Debt Consolidation”, “Not Available” and “Other”. So it seems that many loan takers do not want to state the purpose of their loans.

We will make another plot where we skip those categories, to take a closer look at the other categories.

## [1] Other         Professional  Other         Skilled Labor Executive    
## [6] Professional 
## 68 Levels:  Accountant/CPA Administrative Assistant Analyst ... Waiter/Waitress

Because there are 68 different occupations, we combine them into larger occupation groups in a new data frame.
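The grouping can be sketched with the same lookup idea. The assignment of occupations to groups below is hypothetical, as are all group names except ‘Medical_Health’, which appears later in this report. The spellings "Doctor" and "Nurse (RN)" are assumed level names.

```r
# Hypothetical lookup from an original occupation level to a broader group.
# Only the group name "Medical_Health" is taken from the report itself.
occupation_groups <- c(
  "Accountant/CPA"           = "Finance",
  "Analyst"                  = "Finance",
  "Administrative Assistant" = "Office",
  "Doctor"                   = "Medical_Health",
  "Nurse (RN)"               = "Medical_Health",
  "Waiter/Waitress"          = "Service"
)

# Made-up sample of occupations.
occupation <- c("Other", "Professional", "Doctor",
                "Nurse (RN)", "Analyst", "Waiter/Waitress")

grouped <- unname(occupation_groups[occupation])
# Anything not covered by the lookup falls back to "Other".
grouped[is.na(grouped)] <- "Other"
grouped_occupation <- factor(grouped)

table(grouped_occupation)
```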

## False  True 
## 56459 57478

Borrowers are split almost evenly between home owners and non home owners. Because we filtered out the top outliers from the loan original amount, we can see that slightly more non home owners need a loan.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    4.00    6.00    5.95    8.00   11.00   29084

The plot above shows that most of the data for credit grade is missing.

We can see that the longer people have been employed, the less often they take out a loan.
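For a plot like this, the employment duration in months can be summarised in buckets with `cut()`. The sketch below uses the six `EmploymentStatusDuration` values from the listing at the top of this report. The bucket boundaries and labels are assumptions, not the ones used for the plot.

```r
# EmploymentStatusDuration (months) of the first six rows shown in this report.
duration_months <- c(2, 44, NA, 113, 44, 82)

# Assumed bucket boundaries in months; right = FALSE makes intervals [lower, upper).
breaks <- c(0, 12, 60, 120, Inf)
labels <- c("< 1 year", "1-5 years", "5-10 years", "10+ years")

duration_bucket <- cut(duration_months, breaks = breaks,
                       labels = labels, right = FALSE)

# Missing durations stay NA and are counted separately.
table(duration_bucket, useNA = "ifany")
```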

## Length  Class   Mode 
##      0   NULL   NULL

Most loans are funded by a single investor. In the plot above, we limit the number of loans given to 760 in order to get a better picture of loans funded by more than one investor.

Univariate Analysis

Structure of the dataset

This data set contains 113,937 loans with 82 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information. The explanation of the variables can be found here: https://www.prosper.com/Downloads/Services/Documentation/ProsperDataExport_Details.html

Main features of interest in the dataset

We want to know how much money is needed, when and why. So the most important variables are ‘LoanOriginalAmount’, ‘LoanOriginationDate’ and ‘Category’. We think it is also interesting to see whether there is a difference in loan taking between home owners and non home owners. We are also interested in whether the credit grade and the Prosper score are related to other variables.

More features

Another point of interest is the lender yield. We are also curious about the fees that the company takes.

New variables

To improve readability we created the variable “Category”, in which we mapped the numbers of ‘ListingCategory..numeric.’ to category names. We also created a new variable named ‘GroupedOccupation’. Because the original variable ‘Occupation’ consists of 68 levels, we grouped them to get a better overview. The new variable consists of 8 levels representing our occupational groups.

Changed variables

We performed an outlier check on the LoanOriginalAmount variable and removed the outliers in order to make the following analysis more robust.

Bivariate Plots Section

Now we want to take a look at the home owners. In the next step we want to test whether people who are not home owners more often need small loans for vacation, home improvement or household expenses than home owners do. Our assumption that home owners need fewer loans seems to be wrong: almost half of the borrowers are home owners. We limit the loan amount to a smaller range, because we want to know whether home owners also need smaller loans, for home improvements or other purposes.

In the plot above we can see that house owners need bigger amounts of money than non house owners.

Next we want to see whether there is a relation between the Prosper score and home ownership. Normally a home owner has a better rating, due to more financial security.

Another surprise here. The Prosper score for non home owners is just a little bit lower.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   131.6   217.7   272.5   371.6  2252.0

In the plot above we cannot see any difference between home owners and non home owners in current delinquencies.

## # A tibble: 8 × 6
##   ProsperRating..numeric. mean_amount median_amount min_amount max_amount
##                     <int>       <dbl>         <dbl>      <int>      <int>
## 1                       1    3463.114          4000       1000      16800
## 2                       2    4586.405          4000       1000      15900
## 3                       3    7083.439          6100       1000      15000
## 4                       4   10391.940         10000       1000      25000
## 5                       5   11622.355         10000       1000      35000
## 6                       6   11459.886         10000       1000      35000
## 7                       7   11583.539         10940       1000      35000
## 8                      NA    6159.303          4500       1000      25000
## # ... with 1 more variables: n <int>
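A per-rating summary like the table above can be sketched in base R. The output above is a tibble, so the report probably used a different grouping call; the snippet below is a stand-in that produces the same columns from a made-up data frame `loans` with hypothetical columns `rating` and `amount`.

```r
# Made-up data; `rating` and `amount` are hypothetical column names.
loans <- data.frame(
  rating = c(1, 1, 2, 2, 2, NA),
  amount = c(1000, 4000, 2000, 4000, 9000, 4500)
)

# Treat missing ratings as their own group, as in the table above.
rating_label <- ifelse(is.na(loans$rating), "NA", as.character(loans$rating))
groups <- split(loans$amount, rating_label)

# One row per rating group with the same statistics as the table above.
summary_by_rating <- data.frame(
  rating        = names(groups),
  mean_amount   = sapply(groups, mean),
  median_amount = sapply(groups, median),
  min_amount    = sapply(groups, min),
  max_amount    = sapply(groups, max),
  n             = sapply(groups, length),
  row.names     = NULL
)

print(summary_by_rating)
```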

No surprises here: the better the Prosper score, the lower the borrower rate.

We expect the Prosper rating to be better if the income is verifiable. The plot confirms this.

As we can see in the plot above, the better the credit grade, the fewer the delinquencies.

The majority of the loan takers are full-time employees. The worse the credit grade gets, the more likely the employment status is “not available”. We exclude missing data from our plot to get a clearer picture.

We expect a higher lender yield if the borrower rate is higher.

Yes, we can clearly see that the higher the borrower rate, the higher the lender yield.

Bivariate Analysis

We found out that there is almost no difference between home owners and non home owners by prosper score and delinquencies. There is a slight difference as home owners tend to need bigger loans.

The Prosper rating is better if the income is verifiable. The worse the credit grade, the more delinquencies occurred in the last 7 years.

We notice a strong relation between the lender yield and the borrower rate. The higher the borrower rate, the better the lender yield.
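As a quick numerical check of this relation, the sketch below computes the Pearson correlation for the six `BorrowerRate` and `LenderYield` pairs shown in the listing at the top of this report. Six rows are only an illustration, not a substitute for the full dataset.

```r
# BorrowerRate and LenderYield of the first six rows shown in this report.
borrower_rate <- c(0.1580, 0.0920, 0.2750, 0.0974, 0.2085, 0.1314)
lender_yield  <- c(0.1380, 0.0820, 0.2400, 0.0874, 0.1985, 0.1214)

# Pearson correlation; a value close to 1 indicates a strong linear relation.
r <- cor(borrower_rate, lender_yield)

# Difference between what the borrower pays and what the lender receives.
spread <- borrower_rate - lender_yield

cat("Correlation:", round(r, 3), "\n")
cat("Spread per loan:", spread, "\n")
```

In these six rows the lender yield is always a little below the borrower rate, which is the pattern one would expect if the difference corresponds to fees.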

Multivariate Plots Section

In the plot above we can see that from 2006 to 2010 the loans were taken for 36 months. Maybe this was the only term available back then. From 2010 on, borrowers selected the 60-month term if the loan amount was higher. In 2011 the loans increased a lot.
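A by-year view like this needs the origination date reduced to a year. The sketch below is one way to do that, using the `LoanOriginationDate` and `Term` values of the six rows shown at the top of this report. The variable names `origination_year` and `term` are ours.

```r
# LoanOriginationDate and Term of the first six rows shown in this report.
origination_date <- c("2007-09-12 00:00:00", "2014-03-03 00:00:00",
                      "2007-01-17 00:00:00", "2012-11-01 00:00:00",
                      "2013-09-20 00:00:00", "2013-12-24 00:00:00")
term <- c(36, 36, 36, 36, 36, 60)

# as.Date() reads the date part and ignores the trailing time of day.
origination_year <- as.integer(
  format(as.Date(origination_date, format = "%Y-%m-%d"), "%Y")
)

# Cross-tabulate year and term to see which terms were used when.
table(origination_year, term)
```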

In the factor plot above, we can see that students need smaller amounts of money. The higher the loan amount, the larger the share of home owners among the borrowers.

In the plot above we can see that most of the loans are mid-term.

In the plot above we can see quite well that the investors' yield rises as the borrower rate rises.

The plot above gives us a nice overview. As we can see, most of the loans are mid-term loans with a duration of 36 months. The usages of the loans are well mixed.

The plot above gives an overview of the estimated return and estimated loss by grouped occupation and income range. As there is too much information in the plot, we can hardly see anything. So we are going to split this into two plots.

The plots above are still not readable. So in the next plot we focus on one specific occupation group and its estimated losses and returns.

This plot shows that the lower the customer payments are, the lower the service fees and the interest and fees are.

In the plot above we get an overview of the loans by category and grouped occupation. It is quite hard to find a pattern at first sight, so maybe a plain list would have done a better job.

Multivariate Analysis

Estimated loss and estimated return by income range and grouped occupation

We plotted an overview of the estimated return and the estimated loss by grouped occupation and income range. Because the plot was too dense and unreadable, we split it into two plots, one for the income range and one for the occupation group. But these plots are not very readable either. So we focused on one occupational group, the students. This plot is easy to read. It was quite surprising that there is one outlier with a negative estimated return and a really high estimated loss.

Heatmap of grouped occupation and category by loan original amount

We expected this heatmap to give a nice overview of the occupations and categories, and we were hoping to find patterns at a glance. But this is not the case: the plot has to be studied in detail. Maybe it would have been better to provide a list in this case.

Loans and Fees

From 2006 to 2010 the loans were taken for 36 months. Maybe this was the only term available back then. From 2010 on, borrowers selected the 60-month term if the loan amount was higher. In 2011 the loans increased a lot. We found out that the lower the customer payments are, the lower the service fees and the interest and fees are, which is not a surprise.

Final Plots and Summary

Loan amount by grouped occupation and homeowner status

Description

In the factor plot above, we can see that students need smaller amounts of money. Surprisingly, there are a lot of home owners in the student group. The higher the loan amount, the larger the share of home owners among the borrowers. In some occupational groups there are a lot more home owners. This may be caused by our occupational grouping, which combines professions with a wide income range. For example, the group ‘Medical_Health’ contains doctors as well as nurses.

Histogram of loan amounts by term and loan status

Description

In the plot above we can see that most of the loans are taken with a term of 36 months, even though the amounts are not that big. This may be because until 2009 there were only loans with a term of 36 months. This may also explain why the loan status is completed or charged off for most of the loans under 5,000. We can also see that short-term loans are not often used at the moment. People prefer mid-term or long-term loans.

Estimated return and estimated loss for students

Description

We can see that most of the students earn between $1 and $24,999. There are some students who earn more than $75,000. There are some outliers where the estimated loss is higher than the estimated return. Depending on the income range of a student, the majority of the estimated returns and estimated losses are between 0.05 and 0.15. Surprisingly, there is one outlier with a negative estimated return and a really high estimated loss.

Summary

Because of the large number of variables, it took some time to read through the explanations of the Prosper loan data. To get started we explored a number of different variables. In order to get readable plots, we had to convert some values: for example, the origination date was converted to a year, the numeric categories were mapped to readable categories, and the job duration in months was summarised in buckets.

It was interesting to see that from 2011 on borrowers needed higher loans with longer terms. It was quite a surprise that there is no big difference between home owners and non home owners, because we assumed that home owners are financially stronger and do not need small loans. For the other variables we could not find many surprising facts. For example, the worse the credit grade is, the higher the number of delinquencies in the last 7 years, and the lender yield gets lower the higher the number of investors gets.

It would be nice if there were no big groups like ‘na’ or ‘other’ in the occupation and category variables. Data about the age and gender of the borrower could also lead to interesting findings, if it were provided.